Improving Simple Bayes

Authors

  • Ron Kohavi
  • Barry Becker
  • Dan Sommer
Abstract

The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. We examine different approaches for handling unknowns and zero counts when estimating probabilities. Large-scale experiments on 37 datasets were conducted to determine the effects of these approaches and several interesting insights are given, including a new variant of the Laplace estimator that outperforms other methods for dealing with zero counts. Using the bias-variance decomposition [15, 10], we show that while the SBC has performed well on common benchmark datasets, its accuracy will not scale up as the dataset sizes grow. Even with these limitations in mind, the SBC can serve as an excellent tool for initial exploratory data analysis, especially when coupled with a visualizer that makes its structure comprehensible.
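The two ideas the abstract names (the per-attribute conditional independence model and Laplace-style smoothing of zero counts) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the function names and the `alpha` parameter are assumptions for the sketch, with `alpha=1.0` corresponding to the classic Laplace estimator.

```python
from collections import defaultdict
import math

def train_nb(examples, labels):
    """Count class and per-attribute value frequencies from training data.
    (Illustrative sketch, not the paper's implementation.)"""
    class_counts = defaultdict(int)
    # feature_counts[c][i][v] = times attribute i took value v in class c
    feature_counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    values = defaultdict(set)  # distinct values observed per attribute
    for x, y in zip(examples, labels):
        class_counts[y] += 1
        for i, v in enumerate(x):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_nb(model, x, alpha=1.0):
    """Pick the class maximizing log P(c) + sum_i log P(x_i | c),
    with Laplace-smoothed estimates so unseen values get nonzero mass."""
    class_counts, feature_counts, values = model
    n = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, nc in class_counts.items():
        lp = math.log(nc / n)
        for i, v in enumerate(x):
            # Laplace estimate: (count + alpha) / (class count + alpha * #values)
            lp += math.log((feature_counts[c][i][v] + alpha)
                           / (nc + alpha * len(values[i])))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

For example, training on a handful of labeled tuples and predicting a combination never seen with a given class still yields a sensible answer, because the smoothed estimate never assigns zero probability.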


Similar resources

Bayes Estimation for a Simple Step-stress Model with Type-I Censored Data from the Geometric Distribution

This paper focuses on a Bayes inference model for a simple step-stress life test using a Type-I censored sample in a discrete set-up. Assuming the failure times at each stress level are geometrically distributed, the Bayes estimation problem of the parameters of interest is investigated in both the point and interval approaches. To derive the Bayesian point estimators, some various balanced lo...


Improving spam filtering by combining Naive Bayes with simple k-nearest neighbor searches

Using naive Bayes for email classification has become very popular within the last few months. These classifiers are quite easy to implement and very efficient. In this paper we present empirical results of email classification using a combination of naive Bayes and k-nearest neighbor searches. Using this technique we show that the accuracy of a Bayes filter can be improved slightly for a high numbe...


Improving on the Naïve Bayes Document Classifier

The Naïve Bayes document classifier has been used in many document classification algorithms [1], but is only really useful on a small subset of documents due to its many shortcomings [2]. By augmenting the basic functionality of the simple Naïve Bayes classifier, the classification algorithm can be applied to a much wider range of documents. This paper investigates the advantages which can be...


Fast and Accurate Sentiment Classification Using an Enhanced Naive Bayes Model

We have explored different methods of improving the accuracy of a Naive Bayes classifier for sentiment analysis. We observed that a combination of methods like effective negation handling, word n-grams and feature selection by mutual information results in a significant improvement in accuracy. This implies that a highly accurate and fast sentiment classifier can be built using a simple Naive B...


Techniques for Improving the Performance of Naive Bayes for Text Classification

Naive Bayes is often used in text classification applications and experiments because of its simplicity and effectiveness. However, its performance is often degraded because it does not model text well, by inappropriate feature selection, and by the lack of reliable confidence scores. We address these problems and show that they can be solved by some simple corrections. We demonstrate that our ...


Improving Classification in Bayesian Networks using Structural Learning

Naïve Bayes classifiers are simple probabilistic classifiers. Classification extracts patterns from a data file with a set of labeled training examples and is currently one of the most significant areas in data mining. However, Naïve Bayes assumes independence among the features. Structural learning among the features thus helps in the classification problem. In this study, the use of str...



Journal title:

Volume   Issue 

Pages  -

Publication date: 1997